1.  INTRODUCTION


The pooling of variance estimates obtained under different experimental
conditions is a classical problem in statistical inference. The commonly
used formulas for pooling in the normal case are based on the assumption
that the samples are independent and the population variances are equal
or their ratios are known. Previous studies were mainly oriented to the
pooling problem when the assumption about the equality of variances is
questionable. However, little is known concerning the situation when the
samples may not be independent. In the present study we address the
problem of estimating the common variance of several normal populations
which may be correlated. More specifically, we consider a model of several
independent replicas from a multivariate normal distribution having a
covariance matrix of the form $\sigma^2 \Sigma$, where $\Sigma$ is a
correlation matrix for equicorrelated random variables, with unknown
variance, $\sigma^2$, and correlation coefficient, $\rho$. To simplify, we
present the development under the assumption that the means are known.
A straightforward generalization to the case of unknown means can be given
for practical applications by substituting their maximum likelihood
estimates and modifying the number of degrees of freedom. However, the
theoretical problem of admissibility of the maximum likelihood procedures,
as indicated by Stein [7], then arises, and Stein type estimators can be
attempted.

We investigate the problem of determining efficient confidence intervals
for $\rho$ and $\sigma^2$. Confidence intervals for $\rho$ were previously
determined by Olkin and Pratt [6]. Nontrivial confidence intervals for
$\sigma^2$, when $\rho$ is unknown, are more difficult to obtain. There
exists no uniformly most accurate invariant (UMAI) confidence interval for
$\sigma^2$, since $\rho$ is a nuisance parameter. In the present study we
develop interval estimators for $\sigma^2$ which are asymptotically most
efficient, by employing the best asymptotic normality (BAN) of the maximum
likelihood estimators (see Zacks [11] p. 2hk). The problem is to evaluate
the coverage probabilities of these MLE based interval estimators in small
or medium size samples. Exact coverage probabilities were determined by
employing distributional properties of the statistics involved and some
numerical integrations. The results indicate that the actual coverage
probabilities of the intervals proposed are close to the nominal ones when
the number of replicas is at least 20. For smaller samples the coverage
probabilities may show greater deviation from the nominal ones, especially
when $|\rho|$ is close to one. We have therefore investigated the
properties of alternative estimators in small sample situations.

When $\rho$ is known one can construct UMAI confidence intervals for
$\sigma^2$. The statistic used in this case suggests a system of confidence
intervals in which proper estimates are substituted for the unknown $\rho$.
Intervals of this type will be called "estimated-$\rho$ intervals". In
order to secure the required coverage probability one may attempt to apply
the Bonferroni inequality and use confidence limits for $\rho$, rather than
point estimates. We show that such an approach results in very inefficient
confidence intervals. By trial and error the development of more efficient
intervals of this type might be possible. More research should be performed
in order to derive ... compare the MLE based confidence intervals with
"naive" type intervals, which employ in a non-optimal fashion the
information on $\sigma^2$ obtained from the individual samples. We show in
the present paper that despite the slight small sample deficiencies of the
MLE based confidence intervals, they are generally more efficient than the
other types of intervals considered here. For this purpose we introduce a
measure of relative efficiency which is based on the actual coverage
probability and the expected length of the intervals.

The determination of the actual coverage probabilities and the expected
length is generally a difficult matter. We provide a method for exact
numerical determination in the case of the MLE based intervals. For the
"naive" type intervals an explicit formula for the expected length has been
derived. For evaluating the efficiency of the "estimated-$\rho$ type
intervals" we have employed the Monte Carlo method. Finally, we have
investigated the loss in actual coverage incurred by ignoring the
possibility of correlation and employing the usual confidence intervals for
the case of $\rho = 0$.

There are many practical problems for which the above model of equicorrelated
homoscedastic observations applies. For example, the compressive strengths
of concrete cubes from the same batch are generally correlated. Since a
batch  is  homogeneously  mixed,  all  the  cubes  from  the  same  batch  will  have 
strength  values  with  equal  variances  and  the  correlation  between  any  pair  of 
cubes  will  be  the  same.  Cubes  prepared  from  different  batches  will  generally 
be  independent  and  will  have  the  same  variance  if  the  manufacturing  conditions 
are  similar. 

A  frequent  application  of  the  equicorrelated,  equal  variance  model  is 
outlined  by  Winer  [9]  for  the  one-way  repeated  measures  design. 


2.  THE  MODEL,  THE  LIKELIHOOD  FUNCTION  AND 
THE  FISHER  INFORMATION 

Let $X$ be an $m \times 1$ random vector having an equicorrelated
multinormal distribution with zero mean vector and covariance matrix

$$\Sigma = \sigma^2[(1-\rho)I + \rho J], \tag{2.1}$$

$0 < \sigma^2 < \infty$, $-(m-1)^{-1} < \rho < 1$, where $m \ge 2$ and
$J = \mathbf{1}\mathbf{1}'$ with $\mathbf{1}' = (1, \ldots, 1)$. Consider
the Helmert transformation $Y = HX$, where $H$ is an orthogonal matrix with
first row vector equal to $m^{-1/2}\mathbf{1}'$ (see Kendall and Buckland
[3] p.126). The components of $Y$ are independent normal random variables,
with zero means and variances

$$\mathrm{Var}(Y_j) = \begin{cases} \sigma^2(1+(m-1)\rho), & j = 1 \\ \sigma^2(1-\rho), & j = 2, \ldots, m. \end{cases} \tag{2.2}$$


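Given $n$ independent identically distributed observations
$X_1, \ldots, X_n$, the likelihood function of $(\sigma^2, \rho)$ can be
expressed in terms of the vectors $Y_i = HX_i$, with components $Y_{ij}$, as

$$L(\sigma^2,\rho;\, Y_1,\ldots,Y_n) = c\,(\sigma^2)^{-mn/2}(1+(m-1)\rho)^{-n/2}(1-\rho)^{-(m-1)n/2} \exp\Big\{-(2\sigma^2)^{-1}\Big[(1+(m-1)\rho)^{-1}\sum_{i=1}^n Y_{i1}^2 + (1-\rho)^{-1}\sum_{i=1}^n\sum_{j=2}^m Y_{ij}^2\Big]\Big\}. \tag{2.3}$$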


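Define

$$Q_1 = \sum_{i=1}^n Y_{i1}^2 \quad\text{and}\quad Q_2 = \sum_{i=1}^n\sum_{j=2}^m Y_{ij}^2.$$

It follows that $Q_1$ and $Q_2$ are independent and constitute complete
sufficient statistics for the present model (see Lehmann [4] p.132).
Furthermore, let $\chi^2[\nu]$ designate a chi-squared r.v. with $\nu$
degrees of freedom; then

$$Q_1 \sim \sigma^2(1+(m-1)\rho)\,\chi^2[n] \quad\text{and}\quad Q_2 \sim \sigma^2(1-\rho)\,\chi^2[n(m-1)]. \tag{2.4}$$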

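The log-likelihood (sample information) function can then be expressed as

$$\ell(\sigma^2,\rho) = \log c - \frac{mn}{2}\log\sigma^2 - \frac{n}{2}\log(1+(m-1)\rho) - \frac{(m-1)n}{2}\log(1-\rho) - \frac{Q_1}{2\sigma^2(1+(m-1)\rho)} - \frac{Q_2}{2\sigma^2(1-\rho)}. \tag{2.5}$$

The score functions are given by

$$\begin{aligned} S_1(\sigma^2,\rho) &= \partial\ell/\partial\sigma^2 = -mn/(2\sigma^2) + Q_1/[2\sigma^4(1+(m-1)\rho)] + Q_2/[2\sigma^4(1-\rho)], \\ S_2(\sigma^2,\rho) &= \partial\ell/\partial\rho = -n(m-1)/[2(1+(m-1)\rho)] + n(m-1)/[2(1-\rho)] + (m-1)Q_1/[2\sigma^2(1+(m-1)\rho)^2] - Q_2/[2\sigma^2(1-\rho)^2]. \end{aligned} \tag{2.6}$$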


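The maximum likelihood estimator of $\theta = (\sigma^2, \rho)$ is obtained
from (2.6) as the root of the system of equations $S_1(\sigma^2,\rho) = 0$,
$S_2(\sigma^2,\rho) = 0$. These MLE's are

$$\hat\sigma^2 = \frac{Q_1+Q_2}{mn}, \qquad \hat\rho = \frac{(m-1)Q_1 - Q_2}{(m-1)(Q_1+Q_2)}. \tag{2.7}$$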
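As an illustration only, the estimates can be computed directly from an
n-by-m array of zero-mean observations. The sketch below is not taken from
the study itself: the function name `equicorrelated_mle` is hypothetical,
and it assumes the closed forms
$\hat\sigma^2 = (Q_1+Q_2)/(mn)$ and
$\hat\rho = ((m-1)Q_1 - Q_2)/((m-1)(Q_1+Q_2))$, which solve the score
equations (2.6). It avoids forming $H$ explicitly by using two facts: the
first Helmert coordinate of a row is its sum divided by $m^{1/2}$, and an
orthogonal transformation preserves the total sum of squares.

```python
def equicorrelated_mle(x):
    """Return (sigma2_hat, rho_hat) for an n-by-m list of zero-mean rows.

    Hypothetical helper: assumes known zero means, as in the model above.
    """
    n = len(x)
    m = len(x[0])
    # Q1: first Helmert coordinate of row i is sum(row) / sqrt(m).
    q1 = sum(sum(row) ** 2 for row in x) / m
    # Q2: H is orthogonal, so Q1 + Q2 equals the total sum of squares.
    total = sum(v * v for row in x for v in row)
    q2 = total - q1
    sigma2_hat = (q1 + q2) / (m * n)
    rho_hat = ((m - 1) * q1 - q2) / ((m - 1) * (q1 + q2))
    return sigma2_hat, rho_hat
```

With perfectly concordant pairs such as `[[1.0, 1.0], [-1.0, -1.0]]` all of
the variation falls in $Q_1$, so the estimated correlation reaches its upper
bound of 1; with perfectly discordant pairs it reaches $-(m-1)^{-1}$.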
Since the present model satisfies the Cramer-Rao regularity conditions,
$E\{S_1\} = E\{S_2\} = 0$ and the variances and covariances of the score
functions are given by

$$\begin{aligned} \mathrm{Var}(S_1) &= nm/(2\sigma^4), \\ \mathrm{Var}(S_2) &= \frac{n(m-1)}{2}\cdot\frac{(m-1)(1-\rho)^2 + (1+(m-1)\rho)^2}{(1-\rho)^2(1+(m-1)\rho)^2}, \\ \mathrm{Cov}(S_1,S_2) &= -n(m-1)m\rho/[2\sigma^2(1-\rho)(1+(m-1)\rho)]. \end{aligned} \tag{2.8}$$


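Accordingly, the Fisher information matrix is

$$I(\sigma^2,\rho) = \begin{pmatrix} \mathrm{Var}(S_1) & \mathrm{Cov}(S_1,S_2) \\ \mathrm{Cov}(S_1,S_2) & \mathrm{Var}(S_2) \end{pmatrix} \tag{2.9}$$

and its inverse is

$$I^{-1}(\sigma^2,\rho) = \begin{pmatrix} 2\sigma^4(1+(m-1)\rho^2)/(nm) & 2\sigma^2\rho(1-\rho)(1+(m-1)\rho)/(nm) \\ 2\sigma^2\rho(1-\rho)(1+(m-1)\rho)/(nm) & 2(1-\rho)^2(1+(m-1)\rho)^2/(nm(m-1)) \end{pmatrix}.$$

Hence the asymptotic variance of any asymptotically normal estimator of
$\sigma^2$ is at least $2\sigma^4(1+(m-1)\rho^2)/(nm)$.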
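The closed-form inverse can be checked numerically. The following is a
minimal sketch, not part of the original computations: the function names
are hypothetical, and the entries are typed in from the score variances,
the score covariance and the inverse matrix stated in this section. It
multiplies the information matrix by the stated inverse for one arbitrary
parameter setting and should return the 2-by-2 identity up to rounding.

```python
def fisher_information(sigma2, rho, n, m):
    """Information matrix built from the score variances and covariance."""
    a = 1 + (m - 1) * rho   # scale factor attached to Q1
    b = 1 - rho             # scale factor attached to Q2
    var_s1 = n * m / (2 * sigma2 ** 2)
    var_s2 = n * (m - 1) / 2 * ((m - 1) * b ** 2 + a ** 2) / (a ** 2 * b ** 2)
    cov = -n * (m - 1) * m * rho / (2 * sigma2 * a * b)
    return [[var_s1, cov], [cov, var_s2]]


def inverse_information(sigma2, rho, n, m):
    """Closed-form inverse of the information matrix."""
    a = 1 + (m - 1) * rho
    b = 1 - rho
    v11 = 2 * sigma2 ** 2 * (1 + (m - 1) * rho ** 2) / (n * m)
    v12 = 2 * sigma2 * rho * a * b / (n * m)
    v22 = 2 * a ** 2 * b ** 2 / (n * m * (m - 1))
    return [[v11, v12], [v12, v22]]


def matmul2(p, q):
    """Product of two 2-by-2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Arbitrary illustrative setting: sigma^2 = 2, rho = 0.3, n = 10, m = 3.
product = matmul2(fisher_information(2.0, 0.3, 10, 3),
                  inverse_information(2.0, 0.3, 10, 3))
```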


3.  POINT AND INTERVAL ESTIMATORS BASED ON THE MLE

The MLE of $\sigma^2$ given in (2.7) is uniformly minimum variance unbiased
(UMVU), having a variance

$$V(\hat\sigma^2) = 2\sigma^4(1+(m-1)\rho^2)/(mn). \tag{3.1}$$

This variance coincides with the asymptotic variance of the MLE, which
is presented in $I^{-1}(\sigma^2,\rho)$. Furthermore, $\hat\sigma^2$ is a
best asymptotically normal (BAN) estimator.
Since $Q_1$ and $Q_2$ have gamma distributions with different scale
parameters, their sum does not have a gamma distribution. The exact
distribution of $\hat\sigma^2$ can be expressed as a mixture of gamma
distributions (see Neuts and Zacks [5]). This distribution is, however, too
complicated to provide a simple system of confidence intervals when $\rho$
is unknown. We therefore construct interval estimators, which are based on
the asymptotic normality of the MLE's, with the aim of attaining a
$(1-\alpha)$ coverage probability. These interval estimators are based on
the fact that when $n$ is large then

$$P\Big[\frac{|\hat\sigma^2 - \sigma^2|}{\sigma^2\,\delta(\hat\rho)} \le z_\gamma\Big] \approx 1-\alpha \tag{3.2}$$

where $\gamma = 1 - \alpha/2$, $z_\gamma$ is the $\gamma$-fractile of the
standard normal distribution and

$$\delta(\rho) = [2(1+(m-1)\rho^2)/(mn)]^{1/2}. \tag{3.3}$$

From (3.2) one obtains the system of interval estimators of the form
$(\hat\sigma^2/b_\gamma(\hat\rho),\ \hat\sigma^2/a_\gamma(\hat\rho))$, where
$a_\gamma(\hat\rho) = \max(0,\ 1 - z_\gamma\delta(\hat\rho))$ and
$b_\gamma(\hat\rho) = 1 + z_\gamma\delta(\hat\rho)$. Notice that in small
samples $z_\gamma\delta(\hat\rho)$ could exceed one with positive
probability, in which case $a_\gamma(\hat\rho) = 0$ and the upper limit is
infinite. This is not a problem in large samples since, as follows from
(3.3), $\delta(\hat\rho) \le (2/n)^{1/2}$. Thus, in small samples, the above
system may have a very large or infinite expected length. In order to
overcome this difficulty, we propose a modified system of MLE based
interval estimators, namely
$(\hat\sigma^2 A_\gamma(\hat\rho),\ \hat\sigma^2 B_\gamma(\hat\rho))$, where
$A_\gamma(\hat\rho) = \max(0,\ 1 - t_\gamma[\nu_{\hat\rho}]\,\delta(\hat\rho))$
and $B_\gamma(\hat\rho) = 1 + t_\gamma[\nu_{\hat\rho}]\,\delta(\hat\rho)$.
Here $t_\gamma[\nu]$ is the $\gamma$-fractile of the t-distribution with
$\nu$ degrees of freedom, and $\nu_{\hat\rho}$ is taken as the integer part
of $n(1 + (m-1)(1-\hat\rho^2))$. We have replaced $z_\gamma$ by
$t_\gamma[\nu_{\hat\rho}]$ in order to compensate for interchanging the
roles of $\hat\sigma^2$ and $\sigma^2$ in (3.2). Thus, the
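MLE based confidence intervals are of the form

$$\big(\hat\sigma^2/A^*_\gamma(\hat\rho),\ \hat\sigma^2/B^*_\gamma(\hat\rho)\big), \tag{3.4}$$

where

$$A^*_\gamma(\hat\rho) = \begin{cases} b_\gamma(\hat\rho), & \text{MLE intervals} \\ 1/A_\gamma(\hat\rho), & \text{Modified-MLE intervals} \end{cases} \tag{3.5}$$

$$B^*_\gamma(\hat\rho) = \begin{cases} a_\gamma(\hat\rho), & \text{MLE intervals} \\ 1/B_\gamma(\hat\rho), & \text{Modified-MLE intervals.} \end{cases} \tag{3.6}$$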
We conclude the present section with a comment on confidence intervals
for $\rho$. From (2.4) it is immediately implied that

$$\frac{Q_1}{Q_2} \sim \frac{1+(m-1)\rho}{(m-1)(1-\rho)}\,F[n,\ n(m-1)] \tag{3.7}$$

where $F[\nu_1,\nu_2]$ designates an F-statistic with $\nu_1$ and $\nu_2$
degrees of freedom. A UMAI confidence interval for $\rho$ can be directly
obtained from (3.7) (see Lehmann [4]).


4.  DETERMINATION  OF  THE  EXACT  COVERAGE  PROBABILITY 
AND  EXPECTED  LENGTH  OF  THE  MLE  BASED  INTERVALS 

From (2.4) and the independence of $Q_1$ and $Q_2$ one deduces that

$$\frac{Q_1/(1+(m-1)\rho)}{Q_1/(1+(m-1)\rho) + Q_2/(1-\rho)} \sim \beta(n/2,\ n(m-1)/2) \tag{4.1}$$

where $\beta(p,q)$ designates a random variable having a beta distribution
with parameters $p, q$. For the purpose of clearly distinguishing between
$\rho$ and $\hat\rho$ in what follows, we replace $\hat\rho$ by $R$.
According to (2.7)

$$Q_1/(Q_1+Q_2) = (1+(m-1)R)/m \tag{4.2}$$

and it follows that the left hand side of (4.1) is equivalent to

$$\varphi(R;\rho) = \frac{(1-\rho)(1+(m-1)R)}{(1-\rho)(1+(m-1)R) + (m-1)(1+(m-1)\rho)(1-R)}. \tag{4.3}$$

Accordingly, the probability distribution of $R$, for each $\rho$, can be
determined from (4.1) and (4.3). In addition, it is easy to verify that
$P(R < -(m-1)^{-1}) = 0$.

Consider the random variable

$$Q(\rho) = (1+(m-1)\rho)^{-1}Q_1 + (1-\rho)^{-1}Q_2.$$

The distribution of $Q(\rho)$ is like that of $\sigma^2\chi^2[mn]$.
Algebraic manipulations lead to the expression

$$Q(\rho) = \hat\sigma^2\,\Psi(R;\rho), \qquad \Psi(R;\rho) = n\Big[\frac{1+(m-1)R}{1+(m-1)\rho} + \frac{(m-1)(1-R)}{1-\rho}\Big].$$

That is, $Q(\rho)$ has been factored into the product of the MLE
$\hat\sigma^2$ and the function $\Psi(R;\rho)$. (Notice that for
$R \ge -(m-1)^{-1}$, $\Psi(R;\rho) \ge 0$.)

We show now that $R$ is independent of $Q(\rho)$. Fix $\rho$ and consider
the subfamily of distributions of $(Q_1,Q_2)$, $\mathcal{P}_\rho$, depending
on the scale parameter $\sigma^2$. The random variable $Q(\rho)$ is a
complete sufficient statistic for $\mathcal{P}_\rho$. By Basu's theorem [1],
$Q(\rho)$ and $R$ are independent since $R$ is ancillary (invariant with
respect to the group of scale transformations). Hence, $Q(\rho)$ is
independent of $\Psi(R;\rho)$. It follows that the conditional distribution
of $\hat\sigma^2\Psi(R;\rho)$, given $R$, is like the marginal distribution
of $Q(\rho)$, namely that of $\sigma^2\chi^2[nm]$. Thus, given $R = r$,
$\hat\sigma^2$ is distributed like $(\sigma^2/\Psi(r;\rho))\chi^2[mn]$.
This result is the basis for the determination of the coverage
probabilities and expected length.


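The coverage probability, under $\sigma^2$ and $\rho$, of the intervals
defined in (3.4) is

$$CP = P_{\sigma^2,\rho}\big\{\hat\sigma^2/A^*_\gamma(R) \le \sigma^2 \le \hat\sigma^2/B^*_\gamma(R)\big\}. \tag{4.6}$$

This coverage probability, CP, can be determined according to the previous
result and the law of iterated expectations by

$$CP = E_\rho\big\{P\big[B^*_\gamma(R)\Psi(R;\rho) \le \chi^2[mn] \le A^*_\gamma(R)\Psi(R;\rho) \mid R\big]\big\}.$$

Notice that the coverage probability is a function of $\rho$ only.

Let $\mathrm{Pos}(k \mid \lambda)$ designate the c.d.f. of a Poisson
distribution with mean $\lambda$. If $mn$ is an even integer, the c.d.f. of
the distribution of $\chi^2[nm]$ can be determined by
$\mathrm{Pos}(k \mid \lambda)$ according to the well known relationship

$$P\{\chi^2[mn] \le x\} = 1 - \mathrm{Pos}(mn/2 - 1 \mid x/2).$$

Hence

$$CP = E_\rho\big\{\mathrm{Pos}(mn/2 - 1 \mid \lambda_1(R;\rho)) - \mathrm{Pos}(mn/2 - 1 \mid \lambda_2(R;\rho))\big\} \tag{4.8}$$

where

$$\lambda_1(r;\rho) = \tfrac12 B^*_\gamma(r)\,\Psi(r;\rho) \quad\text{and}\quad \lambda_2(r;\rho) = \tfrac12 A^*_\gamma(r)\,\Psi(r;\rho). \tag{4.9}$$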

As shown earlier, the function $\varphi(R;\rho)$, given by (4.3), is
distributed like $\beta(n/2,\ n(m-1)/2)$. Since $\varphi(R;\rho)$ is a
strictly increasing function of $R$ over the interval $(-(m-1)^{-1}, 1)$,

$$P_\rho\{r' \le R \le r''\} = P_\rho\{\varphi(r';\rho) \le \varphi(R;\rho) \le \varphi(r'';\rho)\} = I_{\varphi(r'';\rho)}(n/2,\ n(m-1)/2) - I_{\varphi(r';\rho)}(n/2,\ n(m-1)/2), \tag{4.10}$$

for any $-(m-1)^{-1} \le r' \le r'' \le 1$, where $I_x(p,q)$ is the
incomplete beta function.

To evaluate the coverage probability (4.6) exactly one needs to integrate
(4.8) with respect to the distribution of $R$ over $(-(m-1)^{-1}, 1)$. We
approximate this integral by partitioning the interval $(-(m-1)^{-1}, 1)$
into a large number, $k$, of subintervals $(r_{i-1}, r_i)$,
$i = 1, 2, \ldots, k$, where $r_0 = -(m-1)^{-1}$ and $r_k = 1$. The CP
function is then approximated by

$$CP \approx \sum_{i=1}^k \big[\mathrm{Pos}(mn/2 - 1 \mid \lambda_1(r_i^*;\rho)) - \mathrm{Pos}(mn/2 - 1 \mid \lambda_2(r_i^*;\rho))\big]\cdot\big[I_{\varphi(r_i;\rho)}(n/2,\ n(m-1)/2) - I_{\varphi(r_{i-1};\rho)}(n/2,\ n(m-1)/2)\big] \tag{4.11}$$

where $r_i^*$ is suitably chosen in $(r_{i-1}, r_i)$. In Table 1 and Table 2
we present the CP values of the MLE and Modified-MLE intervals. The
computations were performed with $k = 100$ equal size subintervals with
$r_i^* = (r_i + r_{i-1})/2$. The incomplete beta function was approximated
using a Fourier series expansion (see Woods and Posten [10]).
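The numerical scheme can be sketched in a few lines. The code below is a
variant under stated assumptions, not a reproduction of the computations
behind Tables 1 and 2: instead of differencing incomplete beta functions
over a partition of the $R$ scale, it applies a midpoint rule to the beta
density of $u = \varphi(R;\rho)$ on (0, 1), which needs only the standard
library. The function names are hypothetical, only the non-modified MLE
interval is covered, and $mn$ must be even for the Poisson identity.

```python
import math
from statistics import NormalDist


def poisson_cdf(k, lam):
    """P(N <= k) for N ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for j in range(1, k + 1):
        term *= lam / j
        total += term
    return total


def beta_pdf(u, p, q):
    """Density of the beta(p, q) distribution at 0 < u < 1."""
    log_norm = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)
    return math.exp(log_norm + (p - 1) * math.log(u) + (q - 1) * math.log(1 - u))


def mle_interval_coverage(n, m, rho, alpha, k=2000):
    """Coverage of (s2/b, s2/a) with a = max(0, 1 - z*delta), b = 1 + z*delta."""
    if (m * n) % 2:
        raise ValueError("the Poisson identity used here needs even m*n")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    scale1, scale2 = 1 + (m - 1) * rho, 1 - rho
    half_df = m * n // 2
    cp = 0.0
    for i in range(k):
        u = (i + 0.5) / k                                 # midpoint on the beta scale
        w = u * scale1 / (u * scale1 + (1 - u) * scale2)  # Q1 / (Q1 + Q2)
        r = (m * w - 1) / (m - 1)                         # corresponding value of R
        psi = n * ((1 + (m - 1) * r) / scale1 + (m - 1) * (1 - r) / scale2)
        delta = math.sqrt(2 * (1 + (m - 1) * r * r) / (m * n))
        a = max(0.0, 1 - z * delta)
        b = 1 + z * delta
        # Given R = r, sigma2_hat / sigma2 is distributed like chi2[mn] / psi.
        cond = (poisson_cdf(half_df - 1, 0.5 * a * psi)
                - poisson_cdf(half_df - 1, 0.5 * b * psi))
        cp += cond * beta_pdf(u, n / 2, n * (m - 1) / 2) / k
    return cp
```

A call such as `mle_interval_coverage(20, 2, 0.1, 0.10)` returns the
coverage of the nominal 90% interval for 20 replicas of pairs; the mapping
from `u` back to `r` inverts (4.3) through the ratio $Q_1/(Q_1+Q_2)$ of (4.2).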

The determination of the expected length of the MLE based intervals
proceeds along similar lines. Let $L(R,\gamma)$ designate the length of
such an interval having nominal coverage probability $\gamma$, i.e.

$$L(R,\gamma) = \hat\sigma^2\Big[\frac{1}{B^*_\gamma(R)} - \frac{1}{A^*_\gamma(R)}\Big]. \tag{4.12}$$

The conditional expectation of $L(R,\gamma)$, given $R$, is

$$E\{L(R,\gamma) \mid R\} = \sigma^2\,\frac{mn}{\Psi(R;\rho)}\Big[\frac{1}{B^*_\gamma(R)} - \frac{1}{A^*_\gamma(R)}\Big]. \tag{4.13}$$

Finally, the expected length of the MLE based intervals is approximated,
similar to (4.11), by

$$E\{L(R,\gamma)\} \approx \sigma^2\,mn \sum_{i=1}^k \frac{A^*_\gamma(r_i^*) - B^*_\gamma(r_i^*)}{\Psi(r_i^*;\rho)\,B^*_\gamma(r_i^*)\,A^*_\gamma(r_i^*)}\cdot\big[I_{\varphi(r_i;\rho)}(n/2,\ n(m-1)/2) - I_{\varphi(r_{i-1};\rho)}(n/2,\ n(m-1)/2)\big]. \tag{4.14}$$

Expected lengths of MLE based intervals for $\sigma^2 = 1$, $m = 2$, and
varying values of $n$, $\rho$ and $\gamma$ are given in Table 3.


5.  ESTIMATED-$\rho$ INTERVALS


When $\rho$ is known one can construct, on the basis of the statistic
$Q(\rho)$, a system of UMAI confidence intervals for $\sigma^2$ at level
$(1-\alpha)$. These intervals are of the form

$$\Big(\frac{Q(\rho)}{\chi^2_{1-\varepsilon_1}[mn]},\ \frac{Q(\rho)}{\chi^2_{\varepsilon_2}[mn]}\Big) \tag{5.1}$$

where $\varepsilon_1, \varepsilon_2$ are determined so that
$\varepsilon_1 + \varepsilon_2 = \alpha$ and an additional condition is
satisfied (see Lehmann [4]). In practice, equal tail intervals are
employed. Estimated-$\rho$ type intervals are obtained from (5.1) by
substituting suitable estimators of $\rho$. If the MLE $\hat\rho$ is
substituted, one obtains the UMAI confidence intervals for the case of
$\rho = 0$ (known), since $Q(\hat\rho) = Q_1 + Q_2 = Q(0)$. As shown later,
this results in a loss of coverage probability when $|\rho|$ is close to 1.
Another approach is based on the Bonferroni inequality


?  2  Tl 

p'°  1-p2 


2  Ua^4'  s  p  s  j  1  1 


-  a  (5-2) 


2 

for  all  (o  ,p),  where 


A2 

mno 


a 


*  Ua 


A2 

mno 


(5-3) 


xl-a/U  tmn]  \x/h[mn] 

and  Pa  are  the  lower  and  upper  (l-  a/2)  confidence  limits  for  p 

as  obtained  from  (3.7).  Consider  the  function 


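$$f(\hat\rho;\rho) = (1-\hat\rho\rho)/(1-\rho^2), \qquad -(m-1)^{-1} < \rho < 1. \tag{5.4}$$

The function $f(\hat\rho;\rho)$ is convex in $\rho$,
$\lim_{\rho\to1} f(\hat\rho;\rho) = \infty$,

$$\lim_{\rho\to-(m-1)^{-1}} f(\hat\rho;\rho) = \begin{cases} \infty, & m = 2 \\ \dfrac{(m-1)(m-1+\hat\rho)}{m(m-2)}, & m \ge 3 \end{cases}$$

and attains the minimum of
$f_0(\hat\rho) = \tfrac12\big(1+(1-\hat\rho^2)^{1/2}\big)$ at
$\rho_0 = \big(1-(1-\hat\rho^2)^{1/2}\big)/\hat\rho$. Note that if
$\hat\rho = 0$, $\rho_0 = 0$. Furthermore, if $\hat\rho \ge 0$ then
$\rho_0 \le \hat\rho$, and vice versa. Inequality (5.2) prescribes a
$(1-\alpha)$ level simultaneous confidence region,
$C_\alpha(\hat\sigma^2,\hat\rho)$, for $(\sigma^2,\rho)$. The boundaries of
this region as functions of $\rho$ are (see Figure 1)

$$b_1(\rho) = L_\alpha f(\hat\rho;\rho), \quad b_2(\rho) = U_\alpha f(\hat\rho;\rho), \quad b_3(\rho) = \underline\rho_\alpha, \quad b_4(\rho) = \bar\rho_\alpha. \tag{5.5}$$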

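A confidence interval for $\sigma^2$, called the BF-interval, can be
obtained by projecting $C_\alpha(\hat\sigma^2,\hat\rho)$ on the
$\sigma^2$-axis. This projection yields the upper confidence limit

$$\bar\sigma^2_\alpha = \begin{cases} U_\alpha f(\hat\rho;\bar\rho_\alpha), & \hat\rho \ge 0 \\ U_\alpha f(\hat\rho;\underline\rho_\alpha), & \hat\rho < 0 \end{cases} \tag{5.6}$$

and the lower confidence limit

$$\underline\sigma^2_\alpha = L_\alpha\,g_\alpha(\hat\rho) \tag{5.7}$$

where

$$g_\alpha(\hat\rho) = \begin{cases} I\{\rho_0 < \underline\rho_\alpha\}\,f(\hat\rho;\underline\rho_\alpha) + I\{\underline\rho_\alpha \le \rho_0\}\,f_0(\hat\rho), & \hat\rho \ge 0 \\ I\{\bar\rho_\alpha \ge \rho_0\}\,f_0(\hat\rho) + I\{\bar\rho_\alpha < \rho_0\}\,f(\hat\rho;\bar\rho_\alpha), & \hat\rho < 0 \end{cases} \tag{5.8}$$

and $I\{A\}$ is the indicator function of the set $A$. Notice that the
intervals of $\sigma^2$ values, $(b_1(\hat\rho), b_2(\hat\rho))$ and
$(b_1(0), b_2(0))$, coincide. Furthermore, the BF-interval
$(\underline\sigma^2_\alpha, \bar\sigma^2_\alpha)$ contains
$(b_1(0), b_2(0))$.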

The BF-intervals are, however, generally too conservative in the sense that
their coverage probabilities are larger than $(1-\alpha)$. Moreover, the
BF-intervals could be considerably skewed around $\hat\sigma^2$ in cases of
small $n$ and $|\rho|$ close to 1. In Table 4 we present a few simulation
estimates of the coverage probabilities (CP) and the expected length (EL)
of the (non-modified) MLE intervals and the BF-intervals for
$1-\alpha = .90$, $m = 2$, $n = 10$ and several values of $\rho$. These
simulation estimates are based on 100 independent and identical replicas
for the case $\sigma^2 = 1$.

We remark that the CP and the EL of these intervals can be determined
exactly according to the method discussed in the previous section. However,
a small scale simulation was employed for the purpose of obtaining
preliminary estimates only. The above estimates indicate that, indeed, the
BF-interval provides higher coverage than desired and is very inefficient
compared to the MLE intervals with respect to the expected length. An
extensive evaluation of the relative efficiency of the MLE intervals will
be given later.


6.  NAIVE INTERVALS FOR $\sigma^2$ AT m = 2.


We discuss in the present section another attempt to derive a system of
confidence intervals for $\sigma^2$, which may yield more efficient results
in small samples and for values of $|\rho|$ close to 1. The investigation
is restricted to the case of $m = 2$. Accordingly, we consider $n$ i.i.d.
vectors $(X_{1i}, X_{2i})$, $i = 1, \ldots, n$, having a joint bivariate
normal distribution with zero means, common variance $\sigma^2$ and
correlation $\rho$. Let $S_1 = \sum_{i=1}^n X_{1i}^2$,
$S_2 = \sum_{i=1}^n X_{2i}^2$ and $S_{12} = \sum_{i=1}^n X_{1i}X_{2i}$.
$(S_1 + S_2, S_{12})$ is a minimal sufficient statistic. A confidence
interval for $\sigma^2$, at level $1-\alpha/2$, based on $S_j$ only,
$j = 1, 2$, is given by
$(S_j/\chi^2_{1-\alpha/4}[n],\ S_j/\chi^2_{\alpha/4}[n])$. From the
Bonferroni inequality we imply that the intersection (if not empty) of
these two intervals is a confidence interval for $\sigma^2$ at level not
less than $(1-\alpha)$. In other words, we consider the interval

$$\Big(\frac{\max(S_1,S_2)}{\chi^2_{1-\alpha/4}[n]},\ \frac{\min(S_1,S_2)}{\chi^2_{\alpha/4}[n]}\Big). \tag{6.1}$$

The interval (6.1) is empty whenever
$\max(S_1,S_2) \ge (\chi^2_{1-\alpha/4}[n]/\chi^2_{\alpha/4}[n])\,\min(S_1,S_2)$.
The probability of this event is generally negligible. Even in cases of
$\rho = 0$ and $\alpha = .20$ the probability that (6.1) is empty does
not exceed .01 for all $n$. (The computation of this probability is based on
a  formula  given  in  Johnson  &  Kotz  [2]  p.222).  For  completeness  we  define  the 
interval  estimator  when  (6.1)  is  empty  to  be  [n],  ^/xj/otn]).  Since 

(6.1) is not a function of the minimal sufficient statistic its efficiency
may be improved. There is reason to expect it to be efficient, however,
when $|\rho|$ is close to 1. We therefore develop a formula for determining
the expected length of (6.1) in order to compare its efficiency with that
of the MLE based intervals given by (3.4).
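The naive interval is simple to compute once chi-squared fractiles are
available. The sketch below is illustrative only and rests on assumptions
that are not in the text: the function names are hypothetical, the
fractiles are found by bisection on the Poisson form of the chi-squared
c.d.f. (valid for even degrees of freedom, so $n$ must be even here), and an
empty intersection is reported as `None` rather than replaced by a fallback
interval.

```python
import math


def poisson_cdf(k, lam):
    """P(N <= k) for N ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for j in range(1, k + 1):
        term *= lam / j
        total += term
    return total


def chi2_quantile_even(p, df):
    """p-fractile of chi-squared with even df, by bisection."""
    if df % 2:
        raise ValueError("this sketch handles even degrees of freedom only")
    lo, hi = 0.0, 4.0 * df + 50.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        # P(chi2[df] <= x) = 1 - Pos(df/2 - 1 | x/2) for even df.
        if 1.0 - poisson_cdf(df // 2 - 1, mid / 2.0) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def naive_interval(s1, s2, n, alpha):
    """Intersection of the two single-sample intervals, or None if empty."""
    lower = max(s1, s2) / chi2_quantile_even(1 - alpha / 4, n)
    upper = min(s1, s2) / chi2_quantile_even(alpha / 4, n)
    return (lower, upper) if lower < upper else None
```

With $n = 10$ and $S_1 = S_2 = 10$ the two single-sample intervals coincide
and the intersection is just that common interval; very unequal sums of
squares, such as $S_1 = 1$ and $S_2 = 100$, give an empty intersection.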

We develop an approximate formula for the expected length of the interval
estimator, disregarding the event that (6.1) is empty. This approximation
may lead to an error which is bounded by $10^{-2}$ in the cases under
consideration.

Let $V = \min(S_1,S_2)$ and $U = \max(S_1,S_2)$. Since
$U + V = S_1 + S_2$ and $E\{S_1 + S_2\} = 2n\sigma^2$, the expected length
of (6.1) is

$$EL = \frac{E\{V\}}{\chi^2_{\alpha/4}[n]} - \frac{E\{U\}}{\chi^2_{1-\alpha/4}[n]} = \frac{2n\sigma^2}{\chi^2_{\alpha/4}[n]} - \Big(\frac{1}{\chi^2_{\alpha/4}[n]} + \frac{1}{\chi^2_{1-\alpha/4}[n]}\Big)E\{U\}. \tag{6.2}$$

Let $F(\eta;\rho,\sigma^2)$ be the c.d.f. of $U$ under $(\sigma^2,\rho)$.
It can be shown that (see Johnson & Kotz [2])

$$F(\eta;\rho,\sigma^2) = P[S_1 \le \eta,\ S_2 \le \eta] = \sum_{j=0}^\infty g(j \mid \rho^2, \tfrac n2)\,\big[P\{\chi^2[n+2j] \le \eta/(\sigma^2(1-\rho^2))\}\big]^2 \tag{6.3}$$

where

$$g(j \mid \rho^2, \tfrac n2) = \frac{\Gamma(\frac n2 + j)}{\Gamma(\frac n2)\,\Gamma(j+1)}\,\rho^{2j}(1-\rho^2)^{n/2}, \qquad j = 0, 1, \ldots \tag{6.4}$$

is the p.d.f. of a negative-binomial distribution. The expected value of
$U$ can be obtained by evaluating
$\int_0^\infty [1 - F(\eta;\rho,\sigma^2)]\,d\eta$. After several
manipulations one obtains for even sample size $n$ the formula

»  |  +  j 

E(U)  -  2n-2(l  -  p2)  ^  g(j|p2,  |)  *  ^  G(f  +  i  -1  |  i  k),  (6.5) 
where  G(*|Y,v)  designates  the  c.d.f.  corresponding  to  g(*|Y,v). 


7.  THE LOSS OF COVERAGE IN ASSUMING $\rho$ = 0.

Suppose the investigator ignores the possibility of correlation. Under the
assumption $\rho = 0$ a confidence interval for $\sigma^2$ could be
obtained according to the distribution of $Q_1 + Q_2$, which is like that
of $\sigma^2\chi^2[mn]$. The upper and lower limits, respectively, of a
$1-\alpha$ confidence interval are

$$\bar\sigma^2 = \frac{Q_1+Q_2}{\chi^2_{\alpha/2}[mn]} \quad\text{and}\quad \underline\sigma^2 = \frac{Q_1+Q_2}{\chi^2_{1-\alpha/2}[mn]}. \tag{7.1}$$

However, if the assumption $\rho = 0$ is incorrect the nominal coverage
probability is unlikely to be attained. When $\rho$ is close to zero, the
actual coverage will be close to the nominal one and thus the "cost of
ignorance" is not great. But for $|\rho|$ near unity the actual coverage
may be considerably less than the nominal one, resulting in a substantial
loss of coverage. In order to investigate the extent of the loss in
coverage probability when $\rho$ is not zero we computed the actual
coverage for $\rho = .1$, $.5$ and $.9$ under varying nominal coverage,
$\gamma$, and different $m$ and $n$.

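The actual coverage probability, AC, of the interval (7.1) is determined in
a manner similar to that of determining the CP values of the MLE based
intervals. Thus, for even $mn$,

$$\begin{aligned} AC &= P_\rho\Big\{\frac{Q_1+Q_2}{\chi^2_{1-\alpha/2}[mn]} \le \sigma^2 \le \frac{Q_1+Q_2}{\chi^2_{\alpha/2}[mn]}\Big\} \\ &= E_\rho\Big\{P\Big[\Psi(R;\rho)\frac{\chi^2_{\alpha/2}[mn]}{mn} \le \chi^2[mn] \le \Psi(R;\rho)\frac{\chi^2_{1-\alpha/2}[mn]}{mn}\ \Big|\ R\Big]\Big\} \\ &\approx \sum_{i=1}^k \big[\mathrm{Pos}(\tfrac{mn}{2} - 1 \mid \lambda_1^*(r_i^*;\rho)) - \mathrm{Pos}(\tfrac{mn}{2} - 1 \mid \lambda_2^*(r_i^*;\rho))\big]\cdot\big[I_{\varphi(r_i;\rho)}(\tfrac n2, \tfrac{n(m-1)}{2}) - I_{\varphi(r_{i-1};\rho)}(\tfrac n2, \tfrac{n(m-1)}{2})\big] \end{aligned} \tag{7.2}$$

where

$$\lambda_1^*(r;\rho) = \Psi(r;\rho)\,\chi^2_{\alpha/2}[mn]/(2mn) \quad\text{and}\quad \lambda_2^*(r;\rho) = \Psi(r;\rho)\,\chi^2_{1-\alpha/2}[mn]/(2mn).$$

AC values of (7.1) are given in Table 5.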
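The loss of coverage can be illustrated numerically. The sketch below makes
several assumptions of its own and is not the computation behind Table 5:
the function names are hypothetical, chi-squared fractiles are obtained by
bisection on the Poisson identity (so $mn$ must be even), and the
expectation over $R$ is taken by a midpoint rule on the beta scale of
$\varphi(R;\rho)$ rather than by differencing incomplete beta functions.
Conditionally on $R = r$, $(Q_1+Q_2)/\sigma^2$ is taken to be distributed
like $(mn/\Psi(r;\rho))\chi^2[mn]$, with
$\Psi(r;\rho) = n[(1+(m-1)r)/(1+(m-1)\rho) + (m-1)(1-r)/(1-\rho)]$.

```python
import math


def poisson_cdf(k, lam):
    """P(N <= k) for N ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for j in range(1, k + 1):
        term *= lam / j
        total += term
    return total


def chi2_quantile_even(p, df):
    """p-fractile of chi-squared with even df, by bisection."""
    lo, hi = 0.0, 4.0 * df + 50.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 1.0 - poisson_cdf(df // 2 - 1, mid / 2.0) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def beta_pdf(u, p, q):
    """Density of the beta(p, q) distribution at 0 < u < 1."""
    log_norm = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)
    return math.exp(log_norm + (p - 1) * math.log(u) + (q - 1) * math.log(1 - u))


def coverage_assuming_independence(n, m, rho, alpha, k=2000):
    """Actual coverage of the rho = 0 interval when the true value is rho."""
    df = m * n
    if df % 2:
        raise ValueError("this sketch needs even m*n")
    q_lo = chi2_quantile_even(alpha / 2, df)
    q_hi = chi2_quantile_even(1 - alpha / 2, df)
    scale1, scale2 = 1 + (m - 1) * rho, 1 - rho
    ac = 0.0
    for i in range(k):
        u = (i + 0.5) / k                                 # midpoint on the beta scale
        w = u * scale1 / (u * scale1 + (1 - u) * scale2)  # Q1 / (Q1 + Q2)
        r = (m * w - 1) / (m - 1)                         # corresponding value of R
        psi = n * ((1 + (m - 1) * r) / scale1 + (m - 1) * (1 - r) / scale2)
        cond = (poisson_cdf(df // 2 - 1, psi * q_lo / (2 * df))
                - poisson_cdf(df // 2 - 1, psi * q_hi / (2 * df)))
        ac += cond * beta_pdf(u, n / 2, n * (m - 1) / 2) / k
    return ac
```

When the true correlation is zero, $\Psi(r;0) = mn$ for every $r$, so the
function returns the nominal level; for a true correlation near one the
returned value falls below the nominal level, which is the loss discussed in
this section.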


8.  THE  RELATIVE  EFFICIENCY  OF  INTERVAL  ESTIMATORS 


The efficiency of interval estimators is a function of two components:
their coverage probabilities (compared to the nominal ones) and their
expected length. If $\rho$ is known, the UMAI confidence intervals (5.1)
are most efficient in the class of all location invariant and scale
preserving confidence intervals, in the sense that among all such intervals
having at least $\gamma$ coverage the UMAI intervals have minimal expected
length. However, when $\rho$ is unknown UMAI confidence intervals do not
exist. Asymptotically, for large samples, the MLE based intervals are most
efficient in a wide class of intervals. However, as seen in the tables,
for relatively small samples the MLE based intervals are not always better
than other competitors. In order to compare alternative interval
estimators, especially in small sample cases, we propose the following
measure of relative efficiency

$$RE = (1+(m-1)\rho^2)^{1/2}\,E\{Q(\rho)\}\,D(\gamma)/EL \tag{8.1}$$

where $\gamma$ is the actual coverage probability of the interval estimator,

$$D(\gamma) = \big(\chi^2_{(1-\gamma)/2}[nm]\big)^{-1} - \big(\chi^2_{(1+\gamma)/2}[nm]\big)^{-1} \tag{8.2}$$

and EL is its expected length. Notice that $E\{Q(\rho)\}D(\gamma)$ is the
expected length of (5.1) having the same coverage probability, $\gamma$, as
the interval estimator under consideration. Finally, since

$$\lim_{n\to\infty}\frac{E\{L(R,\gamma)\}}{E\{Q(\rho)\}D(\gamma)} = (1+(m-1)\rho^2)^{1/2}, \tag{8.3}$$

the R.H.S. of (8.3) was introduced in the numerator of (8.1) in order
to provide a fair comparison. In Table 6 and Table 7 the relative efficiency
values of the MLE and Modified-MLE intervals are presented for various
values of $\rho$, $\gamma$, $n$ and $m$.


9.  DISCUSSION AND CONCLUSION

In Table 1 we see that the coverage probabilities of the MLE intervals
are uniformly close to the nominal ones. On the other hand, as shown in
Table 2, the Modified-MLE intervals do exhibit deficient coverage,
especially for large $\rho$ values and small $n$. This deficiency is due to
the modification, which leads to shorter intervals. The loss in coverage
that may result from ignoring the possibility of correlation is shown in
Table 5 to be rather pronounced when $\rho$ approaches 1 and $m$ increases.
For fixed $m$ and $\rho$ this loss of coverage is insensitive to $n$. The
"naive" confidence interval (6.1) was designed to have coverage probability
not less than the nominal one. It is therefore of interest to compare its
expected length to that of the MLE based intervals. This comparison is
provided in Table 3. We see that for small $n$ values and $\rho = .9$ the
"naive" confidence interval may have a smaller expected length than the MLE
interval. Such cases show areas of possible improvement over the MLE
intervals. In Table 6 and Table 7 the relative efficiency, (8.1), of the
MLE and Modified-MLE intervals is given. We see first that for small sample
size ($n = 10$) and large $\gamma$, the MLE interval may have infinite
expected length. Furthermore, the relative efficiency of the Modified-MLE
intervals is very close to 1 when $\rho = .1$ and decreases slowly as
$\rho$ approaches 1. The Modified-MLE intervals are considerably more
efficient than the MLE intervals, despite the fact that there is some loss
in coverage.

In conclusion, confidence intervals ignoring the possibility of
correlation should not be used. The estimated-$\rho$ intervals discussed in
Section 5 also seem to be inefficient. We do recommend the use of the
Modified-MLE intervals (3.4). Although they may have some coverage
deficiency in small samples, these intervals are highly efficient. Finally,
although the various tables present results for positive $\rho$ values,
similar results would be obtained for negative values of $\rho$.
TABLE 4 - Simulation estimates of coverage probability
(CP) and expected length (EL) of the MLE and
BF intervals for $1-\alpha = .90$, $n = 10$, $m = 2$, 100 replicas.

Coverage probabilities of confidence intervals assuming $\rho = 0$.